A Discriminative Hierarchical Model for Fast Coreference at Large Scale

Authors

  • Michael L. Wick
  • Sameer Singh
  • Andrew McCallum
Abstract

Methods that measure compatibility between mention pairs are currently the dominant approach to coreference. However, they suffer from a number of drawbacks, including difficulty scaling to large numbers of mentions and limited representational power. As the severity of these drawbacks continues to grow with the increasing demand for more data, the need to replace the pairwise approaches with a more expressive, highly scalable alternative is becoming increasingly urgent. In this paper we propose a novel discriminative hierarchical model that recursively structures entities into trees. These trees succinctly summarize the mentions, providing a highly compact, information-rich structure for reasoning about entities and coreference uncertainty at small, large, and massive scales. The unique recursive structure of our entities allows our model to adapt to entities of various sizes, express features over entity hierarchies, and scale to massive data, making our approach a desirable new standard to replace the antiquated pairwise model.

Similar resources

Large-scale author coreference via hierarchical entity representations

Large-scale author coreference, the problem of ascribing research papers to real-world authors in bibliographic databases, is critical for mining the scientific community. However, traditional pairwise approaches, which measure coreference similarity between pairs of author mentions, scale poorly to large databases; and streaming approaches, which lack the ability to retroactively correct error...

A Discriminative Latent Variable Model for Clustering of Streaming Data with Application to Coreference Resolution

We present a latent variable structured prediction model, called the Latent Left-linking Model (L3M), for discriminative supervised clustering of items that follow a streaming order. L3M admits efficient inference and we present a learning framework for L3M that smoothly interpolates between latent structural SVMs and hidden variable CRFs. We present a fast stochastic gradient-based learning techn...

Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models

Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses par...

Distantly Labeling Data for Large Scale Cross-Document Coreference

Cross-document coreference, the problem of resolving entity mentions across multi-document collections, is crucial to automated knowledge base construction and data mining tasks. However, the scarcity of large labeled data sets has hindered supervised machine learning research for this task. In this paper we develop and demonstrate an approach based on “distantly-labeling” a data set from which...

A Discriminative Latent Variable Model for Online Clustering

This paper presents a latent variable structured prediction model for discriminative supervised clustering of items called the Latent Left-linking Model (L3M). We present an online clustering algorithm for L3M based on a feature-based item similarity function. We provide a learning framework for estimating the similarity function and present a fast stochastic gradient-based learning technique. In...

Publication date: 2012